Learning preferences for manipulation tasks from online coactive feedback
Abstract
We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than simple geometric constraints on trajectories; rather, they are governed by the surrounding context of various objects and human interactions in the environment. We propose a coactive online learning framework for teaching preferences in contextually rich environments. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this coactive preference feedback can be more easily elicited than demonstrations of optimal trajectories. Nevertheless, the theoretical regret bounds of our algorithm match the asymptotic rates of algorithms that learn from optimal trajectories. We implement our algorithm on two high-degree-of-freedom robots, PR2 and Baxter, and present three intuitive mechanisms for providing such incremental feedback. In our experimental evaluation we consider two context-rich settings, household chores and grocery store checkout, and show that users are able to train the robot with only a few rounds of feedback (taking only a few minutes).
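The feedback loop described in the abstract can be illustrated with a small sketch. The abstract does not state the update rule, so the code below assumes a perceptron-style coactive update (add the features of the user's improved trajectory, subtract those of the proposed one), which is a common choice in the coactive learning literature and may differ from the paper's actual algorithm. The feature vectors, the function names, and the simulated user are all hypothetical and exist only for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def propose(weights: Vector, candidates: List[Vector]) -> int:
    """Index of the candidate trajectory with the highest learned score.

    Each candidate is represented by a (hypothetical) feature vector.
    Ties are broken in favour of the earliest candidate.
    """
    return max(range(len(candidates)), key=lambda i: dot(weights, candidates[i]))


def coactive_update(weights: Vector, proposed: Vector, improved: Vector) -> Vector:
    """Assumed perceptron-style update: w <- w + phi(improved) - phi(proposed)."""
    return [w + fi - fp for w, fp, fi in zip(weights, proposed, improved)]


def simulated_user(true_weights: Vector, candidates: List[Vector], proposed: int) -> int:
    """Hypothetical user who returns a slightly better trajectory, not the best one.

    The user picks the next candidate up in their own (hidden) preference
    ordering, or keeps the proposal if it is already their favourite.
    """
    ranked = sorted(range(len(candidates)), key=lambda i: dot(true_weights, candidates[i]))
    position = ranked.index(proposed)
    return ranked[min(position + 1, len(ranked) - 1)]


def run_coactive_learning(
    true_weights: Vector, candidates: List[Vector], rounds: int
) -> Tuple[Vector, List[int]]:
    """Run the propose / improve / update loop and record each proposal."""
    weights = [0.0] * len(true_weights)
    history: List[int] = []
    for _ in range(rounds):
        proposed = propose(weights, candidates)
        history.append(proposed)
        improved = simulated_user(true_weights, candidates, proposed)
        if improved != proposed:
            weights = coactive_update(weights, candidates[proposed], candidates[improved])
    return weights, history


if __name__ == "__main__":
    # Three toy trajectories described by two made-up features.
    toy_candidates = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    learned, proposals = run_coactive_learning([0.0, 1.0], toy_candidates, rounds=3)
    print("learned weights:", learned)
    print("proposed indices per round:", proposals)
```

In this toy run the simulated user never demonstrates the best trajectory directly; a single slightly improved suggestion is enough for the learner to propose the user's favourite candidate from the second round on. Real trajectories would need a feature map over the robot's context, which this sketch leaves out.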
Similar resources
Stable Coactive Learning via Perturbation
Coactive Learning is a model of interaction between a learning system (e.g. search engine) and its human users, wherein the system learns from (typically implicit) user feedback during operational use. User feedback takes the form of preferences, and recent work has introduced online algorithms that learn from this weak feedback. However, we show that these algorithms can be unstable and ineffe...
Learning Trajectory Preferences for Manipulators via Iterative Improvement
We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of its users for object manipulation tasks. The key novelty of our approach lies in the type of feedback ex...
Coactive Critiquing: Elicitation of Preferences and Features
When faced with complex choices, users refine their own preference criteria as they explore the catalogue of options. In this paper we propose an approach to preference elicitation suited for this scenario. We extend Coactive Learning, which iteratively collects manipulative feedback, to optionally query example critiques. User critiques are integrated into the learning model by dynamically ext...
A Coactive Learning View of Online Structured Prediction in Statistical Machine Translation
We present a theoretical analysis of online parameter tuning in statistical machine translation (SMT) from a coactive learning view. This perspective allows us to give regret and generalization bounds for latent perceptron algorithms that are common in SMT, but fall outside of the standard convex optimization scenario. Coactive learning also introduces the concept of weak feedback, which we app...
Coactive Learning for Interactive Machine Translation
Coactive learning describes the interaction between an online structured learner and a human user who corrects the learner by responding with weak feedback, that is, with an improved, but not necessarily optimal, structure. We apply this framework to discriminative learning in interactive machine translation. We present a generalization to latent variable models and give regret and generalizati...
Journal:
- I. J. Robotics Res.
Volume 34
Publication date: 2015